Climate change is one of the most discussed topics in the news today. Although it has become something of a buzzword, with the implication that you are not a "decent person" unless you change your personal habits, the vast majority of emissions come from companies, farms, and transportation. At the start of the COVID-19 pandemic, when much of the world quarantined at home, the global footprint barely decreased.
Much of the conversation focuses on switching from gas appliances and vehicles to their electric counterparts to reduce emissions; I wanted to explore the agricultural side of climate change instead.
In this project I use data from the Food and Agriculture Organization of the United Nations (FAO). The goal is to build a better understanding of how agriculture affects our planet by asking two questions:
How has the temperature changed throughout the years?
Can we predict future emissions?
import pandas as pd #used for loading and manipulating tabular data
import numpy as np #used for numerical arrays
import seaborn as sns #used for data visualization
%matplotlib inline
#used to see plots in jupyter notebook
import chart_studio.plotly as py #used for choropleth maps
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.express as px
df_temp = pd.read_csv('FAOSTAT_temp_change.csv')
df_temp.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 162300 entries, 0 to 162299
Data columns (total 14 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   Domain Code       162300 non-null  object
 1   Domain            162300 non-null  object
 2   Area Code (FAO)   162300 non-null  int64
 3   Area              162300 non-null  object
 4   Element Code      162300 non-null  int64
 5   Element           162300 non-null  object
 6   Months Code       162300 non-null  int64
 7   Months            162300 non-null  object
 8   Year Code         162300 non-null  int64
 9   Year              162300 non-null  int64
 10  Unit              162300 non-null  object
 11  Value             156681 non-null  float64
 12  Flag              162300 non-null  object
 13  Flag Description  162300 non-null  object
dtypes: float64(1), int64(5), object(8)
memory usage: 17.3+ MB
The column names used in this study and their meanings:
Domain Code - the code name of the data set
Domain - what the data set pertains to
Area Code (FAO) - the country code
Area - the country name
Element Code - the code for what the data set is calculating (either temperature change or standard deviation)
Element - the name of that quantity (here, temperature change)
Months Code - the code for the month
Months - the month name
Year Code - the code for the year
Year - the year
Unit - the unit of measurement
Value - the temperature change for the given country, month, and year
Flag - the flag code for the row (Fc, NV, NA)
Flag Description - the meaning of the flag (Fc: calculated data, NV: data not available, NA: not applicable)
df_temp.head()
| | Domain Code | Domain | Area Code (FAO) | Area | Element Code | Element | Months Code | Months | Year Code | Year | Unit | Value | Flag | Flag Description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ET | Temperature change | 2 | Afghanistan | 7271 | Temperature change | 7001 | January | 1961 | 1961 | °C | 0.746 | Fc | Calculated data |
| 1 | ET | Temperature change | 2 | Afghanistan | 7271 | Temperature change | 7001 | January | 1962 | 1962 | °C | 0.009 | Fc | Calculated data |
| 2 | ET | Temperature change | 2 | Afghanistan | 7271 | Temperature change | 7001 | January | 1963 | 1963 | °C | 2.695 | Fc | Calculated data |
| 3 | ET | Temperature change | 2 | Afghanistan | 7271 | Temperature change | 7001 | January | 1964 | 1964 | °C | -5.277 | Fc | Calculated data |
| 4 | ET | Temperature change | 2 | Afghanistan | 7271 | Temperature change | 7001 | January | 1965 | 1965 | °C | 1.827 | Fc | Calculated data |
This data set records temperature variability for countries around the world, dating back to 1961. Looking over the data set, there isn't much to clean other than the Year Code column, which duplicates Year.
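Before dropping a column as redundant, it can be worth confirming the redundancy in code. The following is a minimal sketch on a toy stand-in frame, not the real FAOSTAT file; the `toy` frame and the Albania value are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for df_temp; the Albania row is made up for illustration.
toy = pd.DataFrame({
    "Area": ["Afghanistan", "Afghanistan", "Albania"],
    "Year Code": [1961, 1962, 1961],
    "Year": [1961, 1962, 1961],
    "Value": [0.746, 0.009, 0.5],
})

# True only if the two columns agree on every row.
year_code_is_redundant = bool((toy["Year Code"] == toy["Year"]).all())
print(year_code_is_redundant)

# Share of missing values per column, a quick guide to what needs cleaning.
missing_share = toy.isna().mean()
print(missing_share)
```

Run against the real frame, the same missing-value check should flag the Value column, since `info()` reports 156,681 non-null values out of 162,300 rows.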
df_temp.drop('Year Code', axis=1, inplace=True)
df_temp.describe()
| | Area Code (FAO) | Element Code | Months Code | Year | Value |
|---|---|---|---|---|---|
| count | 162300.000000 | 162300.0 | 162300.000000 | 162300.000000 | 156681.000000 |
| mean | 130.647689 | 7271.0 | 7006.500000 | 1991.306248 | 0.493085 |
| std | 76.809078 | 0.0 | 3.452063 | 17.333268 | 1.114205 |
| min | 1.000000 | 7271.0 | 7001.000000 | 1961.000000 | -9.303000 |
| 25% | 64.000000 | 7271.0 | 7003.750000 | 1976.000000 | -0.103000 |
| 50% | 131.000000 | 7271.0 | 7006.500000 | 1992.000000 | 0.417000 |
| 75% | 194.000000 | 7271.0 | 7009.250000 | 2006.000000 | 1.031000 |
| max | 351.000000 | 7271.0 | 7012.000000 | 2020.000000 | 11.759000 |
Since the minimum and maximum temperature changes are far apart and the mean change is positive, I chose a choropleth map to visualize how the change develops over time.
fig = px.choropleth(
    df_temp, locations="Area", locationmode="country names",  #match countries by name
    animation_frame="Year", animation_group="Area",  #one frame per year, tracking each country
    color="Value", hover_name="Area",  #color each country by its temperature change
    color_continuous_scale=["#E0FFFF", "#FF0000"],  #light cyan (low) to red (high)
    title="Global Temperature Change",
)
fig.show()
Even though some countries render blue instead of red, which I attribute to values in the data set that are not in the expected form, the choropleth map overall shows the globe warming over the years.
This temperature change indicates that further observation is needed to understand today's climate.
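The trend seen on the map can also be checked numerically by averaging the temperature change per year. This is a rough sketch on made-up numbers: the `toy` frame is hypothetical, and a plain mean across countries is an assumption that ignores country size.

```python
import pandas as pd

# Hypothetical values for two areas in two years (not real FAOSTAT data).
toy = pd.DataFrame({
    "Area": ["A", "A", "B", "B"],
    "Year": [1961, 2020, 1961, 2020],
    "Value": [0.1, 1.3, -0.3, 1.1],
})

# Unweighted mean temperature change per year; NaN values are skipped by default.
yearly_mean = toy.groupby("Year")["Value"].mean()
print(yearly_mean)
```

Applied to `df_temp`, the resulting series could be plotted as a line chart to address the first question directly.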
df_emm = pd.read_csv('Emissions_Totals.csv')
df_emm.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13949 entries, 0 to 13948
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Domain Code       13949 non-null  object
 1   Domain            13949 non-null  object
 2   Area Code (FAO)   13949 non-null  int64
 3   Area              13949 non-null  object
 4   Element Code      13949 non-null  int64
 5   Element           13949 non-null  object
 6   Item Code         13949 non-null  int64
 7   Item              13949 non-null  object
 8   Year Code         13949 non-null  int64
 9   Year              13949 non-null  int64
 10  Source Code       13949 non-null  int64
 11  Source            13949 non-null  object
 12  Unit              13949 non-null  object
 13  Value             13949 non-null  float64
 14  Flag              13949 non-null  object
 15  Flag Description  13949 non-null  object
 16  Note              0 non-null      float64
dtypes: float64(2), int64(6), object(9)
memory usage: 1.8+ MB
The column names used in this study and their meanings:
Area Code - the country code
Area - the country name
Item Code - the code for the emission source
Item - the name of the emission source
Element Code - the code for the emitted element
Element - the name of the emitted element
Year Code - the code for the year
Year - the year
Source Code - the code for the source of the data
Source - the name of the source of the data
Unit - the unit of measurement used in this data set
Value - the amount recorded
Flag - the flag code for the row
Note - additional information about each row
df_emm.head()
| | Domain Code | Domain | Area Code (FAO) | Area | Element Code | Element | Item Code | Item | Year Code | Year | Source Code | Source | Unit | Value | Flag | Flag Description | Note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GT | Emissions Totals | 2 | Afghanistan | 7225 | Emissions (CH4) | 5058 | Enteric Fermentation | 2019 | 2019 | 3050 | FAO TIER 1 | kilotonnes | 389.6563 | Fc | Calculated data | NaN |
| 1 | GT | Emissions Totals | 2 | Afghanistan | 724413 | Emissions (CO2eq) from CH4 (AR5) | 5058 | Enteric Fermentation | 2019 | 2019 | 3050 | FAO TIER 1 | kilotonnes | 10910.3754 | Fc | Calculated data | NaN |
| 2 | GT | Emissions Totals | 2 | Afghanistan | 723113 | Emissions (CO2eq) (AR5) | 5058 | Enteric Fermentation | 2019 | 2019 | 3050 | FAO TIER 1 | kilotonnes | 10910.3754 | Fc | Calculated data | NaN |
| 3 | GT | Emissions Totals | 2 | Afghanistan | 7225 | Emissions (CH4) | 5059 | Manure Management | 2019 | 2019 | 3050 | FAO TIER 1 | kilotonnes | 26.1252 | Fc | Calculated data | NaN |
| 4 | GT | Emissions Totals | 2 | Afghanistan | 7230 | Emissions (N2O) | 5059 | Manure Management | 2019 | 2019 | 3050 | FAO TIER 1 | kilotonnes | 0.3654 | Fc | Calculated data | NaN |
This data set consists of emission types and amounts for each country per year. Other than the redundant Year Code column and the empty Note column, there isn't much else to clean for now.
df_emm.drop('Year Code', axis=1, inplace=True)
df_emm.drop('Note', axis=1, inplace=True)
df_emm.describe()
| | Area Code (FAO) | Element Code | Item Code | Year | Source Code | Value |
|---|---|---|---|---|---|---|
| count | 13949.000000 | 13949.000000 | 13949.000000 | 13949.0 | 13949.0 | 13949.000000 |
| mean | 130.726145 | 387474.273568 | 13503.999642 | 2019.0 | 3050.0 | 1276.922239 |
| std | 76.730232 | 357616.933583 | 20592.883945 | 0.0 | 0.0 | 21896.693851 |
| min | 1.000000 | 7225.000000 | 5058.000000 | 2019.0 | 3050.0 | -651765.300100 |
| 25% | 65.000000 | 7230.000000 | 5062.000000 | 2019.0 | 3050.0 | 0.000300 |
| 50% | 129.000000 | 723113.000000 | 6750.000000 | 2019.0 | 3050.0 | 1.247100 |
| 75% | 194.000000 | 724313.000000 | 6993.000000 | 2019.0 | 3050.0 | 88.975500 |
| max | 351.000000 | 724413.000000 | 69921.000000 | 2019.0 | 3050.0 | 653402.772000 |
Since there are several types of emissions along with multiple sources of said emissions, I need to filter each emission type to get a better estimate of the mean, standard deviation, minimum and maximum numbers.
df_emm_ch4 = df_emm[df_emm['Element Code']==7225] #CH4 emissions (methane)
df_emm_ch4.describe()
| | Area Code (FAO) | Element Code | Item Code | Year | Source Code | Value |
|---|---|---|---|---|---|---|
| count | 1856.000000 | 1856.0 | 1856.000000 | 1856.0 | 1856.0 | 1856.000000 |
| mean | 131.440733 | 7225.0 | 14021.238147 | 2019.0 | 3050.0 | 87.703295 |
| std | 76.694013 | 0.0 | 21047.855179 | 0.0 | 0.0 | 600.519128 |
| min | 1.000000 | 7225.0 | 5058.000000 | 2019.0 | 3050.0 | 0.000000 |
| 25% | 66.000000 | 7225.0 | 5060.000000 | 2019.0 | 3050.0 | 0.000000 |
| 50% | 130.000000 | 7225.0 | 6795.000000 | 2019.0 | 3050.0 | 0.142450 |
| 75% | 195.000000 | 7225.0 | 6993.000000 | 2019.0 | 3050.0 | 7.191150 |
| max | 351.000000 | 7225.0 | 69921.000000 | 2019.0 | 3050.0 | 14053.658400 |
df_emm_n2o = df_emm[df_emm['Element Code']==7230] #N2O emissions (nitrous oxide)
df_emm_n2o.describe()
| | Area Code (FAO) | Element Code | Item Code | Year | Source Code | Value |
|---|---|---|---|---|---|---|
| count | 2173.000000 | 2173.0 | 2173.000000 | 2173.0 | 2173.0 | 2173.000000 |
| mean | 130.722964 | 7230.0 | 15546.099862 | 2019.0 | 3050.0 | 4.385206 |
| std | 76.847692 | 0.0 | 22925.319603 | 0.0 | 0.0 | 25.818491 |
| min | 1.000000 | 7230.0 | 5059.000000 | 2019.0 | 3050.0 | 0.000000 |
| 25% | 65.000000 | 7230.0 | 5062.000000 | 2019.0 | 3050.0 | 0.001500 |
| 50% | 129.000000 | 7230.0 | 5066.000000 | 2019.0 | 3050.0 | 0.096700 |
| 75% | 194.000000 | 7230.0 | 6994.000000 | 2019.0 | 3050.0 | 1.222300 |
| max | 351.000000 | 7230.0 | 69921.000000 | 2019.0 | 3050.0 | 614.678800 |
df_emm_co2 = df_emm[df_emm['Element Code']==7273] #CO2 emissions (carbon dioxide)
df_emm_co2.describe()
| | Area Code (FAO) | Element Code | Item Code | Year | Source Code | Value |
|---|---|---|---|---|---|---|
| count | 992.000000 | 992.0 | 992.000000 | 992.0 | 992.0 | 992.000000 |
| mean | 131.002016 | 7273.0 | 13329.879032 | 2019.0 | 3050.0 | 1326.107959 |
| std | 76.921337 | 0.0 | 18674.642952 | 0.0 | 0.0 | 51278.042131 |
| min | 1.000000 | 7273.0 | 6750.000000 | 2019.0 | 3050.0 | -651765.300100 |
| 25% | 65.000000 | 7273.0 | 6751.000000 | 2019.0 | 3050.0 | 0.000000 |
| 50% | 129.000000 | 7273.0 | 6993.000000 | 2019.0 | 3050.0 | 0.000000 |
| 75% | 194.000000 | 7273.0 | 6994.000000 | 2019.0 | 3050.0 | 434.604650 |
| max | 351.000000 | 7273.0 | 67292.000000 | 2019.0 | 3050.0 | 653402.772000 |
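The per-gas summaries above are built by filtering one Element Code at a time. As an alternative sketch, the same statistics can be produced for every element in a single pass with `groupby`. The `toy` frame below is a small stand-in, and its last value is an illustrative assumption.

```python
import pandas as pd

# Small stand-in for df_emm; the final N2O value is made up for illustration.
toy = pd.DataFrame({
    "Element": ["Emissions (CH4)", "Emissions (CH4)",
                "Emissions (N2O)", "Emissions (N2O)"],
    "Value": [389.6563, 26.1252, 0.3654, 1.0],
})

# One row of summary statistics per element instead of one filter per element.
per_element = toy.groupby("Element")["Value"].describe()
print(per_element[["count", "mean", "min", "max"]])
```

On the full data set this also covers the CO2eq elements, which the three filters above do not summarize.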
Enteric Fermentation - Digestive process by which carbohydrates are broken down by microorganisms into simple molecules for absorption into the bloodstream of an animal. Greenhouse gas emissions from enteric fermentation consist of methane gas.
Manure Management - Refers to capture, storage, treatment, and utilization of animal manure. Greenhouse gas emissions from manure management consist of methane and nitrous oxide gases from aerobic and anaerobic manure decomposition processes.
Rice Cultivation - Agricultural practice for growing rice seeds. Greenhouse gas emissions from rice cultivation consist of methane gas from the anaerobic decomposition of organic matter in paddy fields.
Synthetic Fertilizers - Inorganic material of synthetic origin added to a soil to supply one or more plant nutrients essential to the growth of plants. Greenhouse gas emissions from synthetic fertilizers consist of the addition of nitrous oxide gas to managed soils.
Manure applied to Soils - Animal waste distributed on fields in amounts that enrich soils. Greenhouse gas emissions from manure applied to soils consist of nitrous oxide gas from manure added to managed soils.
Manure left on Pasture - Animal waste left on managed soils from grazing livestock. Greenhouse gas emissions from manure left on pasture consist of nitrous oxide gas.
Crop Residues - Agriculture management practice that consists in returning to managed soils the residual part of the produce. The associated greenhouse gas emissions are nitrous oxide gas from crop residues’ decomposition.
Burning - Crop residues - Agriculture management practice that consists in the combustion of a percentage of crop residues burnt on-site. Greenhouse gas emissions from burning crop residues are methane and nitrous oxide gases.
Net Forest conversion - The net forest conversion is calculated as the difference of forest area for two consecutive years, consistently with IPCC approach 1. The term “net” indicates that no further specification on the underlying dynamics of the computed land area change is possible. Greenhouse gas emissions consist of the net contribution of CO2 sources and sinks due to deforestation, reforestation and afforestation activities within countries.
fig = px.choropleth(
    df_emm, locations="Area", locationmode="country names",  #match countries by name
    animation_frame="Item", animation_group="Area",  #one frame per emission source
    color="Value", color_continuous_scale="Reds",  #darker red indicates a larger amount
    hover_name="Area", hover_data=["Element"],  #show the emitted gas on hover
    title="Emissions Totals for 2019",
)
fig.show()
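The map colors by Value across every Element, so methane amounts and CO2-equivalent totals for the same source share one color scale. One possible refinement, sketched here on a toy frame, is to keep a single element before mapping. The choice of `Emissions (CO2eq) (AR5)` and the Manure Management value are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for df_emm; the Manure Management CO2eq value is made up.
toy = pd.DataFrame({
    "Area": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "Element": ["Emissions (CH4)", "Emissions (CO2eq) (AR5)",
                "Emissions (CO2eq) (AR5)"],
    "Item": ["Enteric Fermentation", "Enteric Fermentation",
             "Manure Management"],
    "Value": [389.6563, 10910.3754, 500.0],
})

# Keep only rows expressed in CO2-equivalent so all values are comparable.
co2eq = toy[toy["Element"] == "Emissions (CO2eq) (AR5)"]
print(co2eq[["Area", "Item", "Value"]])
```

The filtered frame could then be passed to `px.choropleth` in place of `df_emm`.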
Checking .info() again to see which columns are objects.
df_emm.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13949 entries, 0 to 13948
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Domain Code       13949 non-null  object
 1   Domain            13949 non-null  object
 2   Area Code (FAO)   13949 non-null  int64
 3   Area              13949 non-null  object
 4   Element Code      13949 non-null  int64
 5   Element           13949 non-null  object
 6   Item Code         13949 non-null  int64
 7   Item              13949 non-null  object
 8   Year              13949 non-null  int64
 9   Source Code       13949 non-null  int64
 10  Source            13949 non-null  object
 11  Unit              13949 non-null  object
 12  Value             13949 non-null  float64
 13  Flag              13949 non-null  object
 14  Flag Description  13949 non-null  object
dtypes: float64(1), int64(5), object(9)
memory usage: 1.6+ MB
For this part of the research I won't need the columns that don't help with the prediction.
#drop every column that is not used for the prediction in a single call
cols_to_drop = ['Domain Code', 'Domain', 'Area Code (FAO)', 'Element Code',
                'Item Code', 'Year', 'Source Code', 'Source', 'Unit',
                'Flag', 'Flag Description']
df_emm.drop(columns=cols_to_drop, inplace=True)
Since Area, Element, and Item are all categorical, these columns need to be transformed into numerical variables for sklearn to understand them.
cat_feats = ['Area', 'Element', 'Item']
final_data = pd.get_dummies(df_emm, columns = cat_feats, drop_first = True)
final_data.head()
| | Value | Area_Albania | Area_Algeria | Area_American Samoa | Area_Andorra | Area_Angola | Area_Anguilla | Area_Antigua and Barbuda | Area_Argentina | Area_Armenia | ... | Item_Forest fires | Item_Forestland | Item_Manure Management | Item_Manure applied to Soils | Item_Manure left on Pasture | Item_Net Forest conversion | Item_On-farm energy use | Item_Rice Cultivation | Item_Savanna fires | Item_Synthetic Fertilizers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 389.6563 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 10910.3754 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 10910.3754 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 26.1252 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0.3654 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 259 columns
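To make the encoding step concrete, here is a small sketch of what `pd.get_dummies` with `drop_first=True` does to one categorical column. The three-row `toy` frame is an illustrative stand-in, and `dtype=int` is added only so the output shows 0/1 regardless of pandas version.

```python
import pandas as pd

# Three-row stand-in with a single categorical column.
toy = pd.DataFrame({
    "Value": [389.6563, 26.1252, 0.3654],
    "Item": ["Enteric Fermentation", "Manure Management", "Manure Management"],
})

# Each category becomes its own 0/1 column; drop_first removes the first
# category in sorted order, which then acts as the implicit baseline.
encoded = pd.get_dummies(toy, columns=["Item"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
print(encoded["Item_Manure Management"].tolist())
```

In this toy frame Enteric Fermentation sorts first and is the category that gets dropped. In the full data set a different item sorts first, which is why the `Item_Enteric Fermentation` column is still available afterwards.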
Here I will split the data into a training set and a testing set for predictions. For this project I chose Enteric Fermentation as the target to predict.
from sklearn.model_selection import train_test_split #used to split data
X = final_data.drop('Item_Enteric Fermentation',axis = 1)
y = final_data['Item_Enteric Fermentation']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3 , random_state= 50)
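Because the target is rare, a plain random split can leave the training and testing sets with slightly different shares of positive rows. The sketch below shows the `stratify` option of `train_test_split` on synthetic data; the array names and the 5% positive rate are assumptions for illustration, not values from the data set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in: 1000 rows, 5% positives.
rng = np.random.default_rng(50)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.zeros(1000, dtype=int)
y_toy[:50] = 1

# stratify=y_toy keeps the positive share the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=50, stratify=y_toy
)
print(y_tr.mean(), y_te.mean())
```

The same argument could be added to the split above as `stratify=y`.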
This part sets up the decision tree for later predictions.
from sklearn.tree import DecisionTreeClassifier #to create an instance and fit it with the data
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)
DecisionTreeClassifier()
Now to set up predictions and a confusion matrix.
y_predict = dtree.predict(X_test)
from sklearn.metrics import confusion_matrix, classification_report
print(classification_report(y_test, y_predict))
precision recall f1-score support
0 0.99 0.99 0.99 4004
1 0.67 0.67 0.67 181
accuracy 0.97 4185
macro avg 0.83 0.83 0.83 4185
weighted avg 0.97 0.97 0.97 4185
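For class 0 (rows that are not Enteric Fermentation), precision, recall, and f1-score are all 0.99. For class 1 (Enteric Fermentation rows), all three are 0.67. The macro average is 0.83 and the overall accuracy is 0.97. Overall this is a good prediction, although the high accuracy largely reflects the class imbalance (4,004 of the 4,185 test rows are class 0), and the model is noticeably weaker at identifying Enteric Fermentation rows.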
print(confusion_matrix(y_test, y_predict))
[[3944   60]
 [  59  122]]
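The confusion matrix puts the true classes on the rows and the predicted classes on the columns. Treating Enteric Fermentation (class 1) as the positive class, there are 3,944 true negatives (TN), 60 false positives (FP), 59 false negatives (FN), and 122 true positives (TP). There are roughly twice as many true positives as either false positives or false negatives.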
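The class 1 scores in the classification report can be reproduced by hand from the four counts in the confusion matrix. The sketch below assumes scikit-learn's layout of true labels on rows and predicted labels on columns, with class 1 as the positive class.

```python
# Counts copied from the printed confusion matrix:
# row 0 = true class 0, row 1 = true class 1.
tn, fp = 3944, 60
fn, tp = 59, 122

precision = tp / (tp + fp)   # of the rows predicted positive, how many are right
recall = tp / (tp + fn)      # of the truly positive rows, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 2))
# 0.67 0.67 0.67 0.97
```

These match the class 1 row and the accuracy line of the report.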
In summary, agriculture is a major contributor to climate change because of the amount of GHG emissions generated on farms and agricultural land. With Enteric Fermentation selected as the target for training and testing, I found that the data set works better with a random forest classification model than with a single decision tree.
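A comparison between a single decision tree and a random forest could be set up along the following lines. This is only a sketch on synthetic data from `make_classification`, not the notebook's own experiment; the sample size, class weights, and hyperparameters are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the encoded emissions data.
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    weights=[0.95, 0.05], random_state=50,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=50, stratify=y_syn
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=50),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=50),
}

# Fit both models on the same split and score the rare positive class.
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = f1_score(y_te, model.predict(X_te))

print(scores)
```

Scoring with the F1 of the positive class rather than accuracy keeps the comparison focused on the rare class, where the two models are most likely to differ.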